Large Scale Spectral Clustering Using Resistance Distance and Spielman-Teng Solvers

Authors

  • Nguyen Lu Dang Khoa
  • Sanjay Chawla
Abstract

Spectral clustering is a novel clustering method that can detect clusters of complex shape. However, it requires the eigendecomposition of the graph Laplacian matrix, whose cost is proportional to O(n^3), so it is not suitable for large-scale systems. Recently, many methods have been proposed to reduce the computation time of spectral clustering. These approximate methods usually involve sampling techniques, through which much of the information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither sampling nor the computation of any eigenvector. Instead, it uses random projection and a linear time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that the proposed approach achieves better clustering quality and is faster than state-of-the-art approximate spectral clustering methods.

Keywords: spectral clustering, commute time embedding, random projection, linear time solver
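The abstract names the ingredients (random projection plus a linear-system solver instead of eigenvectors) but not the formulas. The following is a minimal pure-Python sketch under the assumption of the standard commute-time formulation. The commute time is c(i,j) = V_G (e_i - e_j)^T L^+ (e_i - e_j), where V_G is the graph volume, L^+ the pseudo-inverse of the Laplacian, B the signed edge-node incidence matrix, and W the diagonal edge-weight matrix. Because L = B^T W B, the k x n matrix sqrt(V_G) Q W^{1/2} B L^+ with a random +-1/sqrt(k) matrix Q gives each node a k-dimensional point whose squared distances approximate commute times. Each of its k rows costs one solve of L y = z. Plain conjugate gradient stands in here for a Spielman-Teng solver, and the final k-means step is an assumed clustering stage. All function names, the toy graph, and the choice k = 50 are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def laplacian_matvec(n, edges, x):
    """Return L @ x for the weighted Laplacian L = D - W, using only the edge list."""
    y = [0.0] * n
    for u, v, w in edges:
        d = w * (x[u] - x[v])
        y[u] += d
        y[v] -= d
    return y


def conjugate_gradient(n, edges, b, tol=1e-10, max_iter=1000):
    """Solve L x = b by plain conjugate gradient (a stand-in for a Spielman-Teng solver).

    L is singular, so b must sum to zero; starting from x = 0 keeps every iterate
    orthogonal to the all-ones vector, the null space of L for a connected graph.
    """
    x = [0.0] * n
    r = list(b)
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if math.sqrt(rs) < tol:
            break
        lp = laplacian_matvec(n, edges, p)
        alpha = rs / sum(pi * lpi for pi, lpi in zip(p, lp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * lpi for ri, lpi in zip(r, lp)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def commute_time_embedding(n, edges, k, seed=0):
    """Map each node to a point in R^k whose squared distances approximate commute times.

    Row j of Q W^{1/2} B is built directly from the edges, then L y = z_j is solved,
    so the rows together form sqrt(vol) * Q W^{1/2} B L^+ (no eigenvectors needed).
    """
    rng = random.Random(seed)
    vol = 2.0 * sum(w for _, _, w in edges)  # graph volume = sum of weighted degrees
    rows = []
    for _ in range(k):
        z = [0.0] * n
        for u, v, w in edges:
            q = rng.choice((-1.0, 1.0)) / math.sqrt(k)  # random +-1/sqrt(k) entry of Q
            z[u] += q * math.sqrt(w)  # incidence row of edge (u, v): +1 at u, -1 at v
            z[v] -= q * math.sqrt(w)
        y = conjugate_gradient(n, edges, z)  # z sums to zero, so L y = z is consistent
        rows.append([math.sqrt(vol) * yi for yi in y])
    # Transpose the k x n matrix so that points[i] is the k-dimensional vector of node i
    return [[rows[j][i] for j in range(k)] for i in range(n)]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, c, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(sq_dist(p, m) for m in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(c), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical toy graph: two unit-weight triangles joined by a weak bridge 2-3
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]
    points = commute_time_embedding(6, edges, k=50)
    print(kmeans(points, 2))
```

The design point the sketch tries to show is that the Laplacian is only ever touched through matrix-vector products over the edge list. The cost is therefore k linear solves rather than a dense eigendecomposition, and with a nearly-linear-time solver each solve scales with the number of edges. On the toy graph the two triangles should land in separate clusters, because the weak bridge makes cross-triangle commute times much larger than within-triangle ones.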

Similar resources

The Work of Daniel A. Spielman

Dan Spielman has made groundbreaking contributions in theoretical computer science and mathematical programming, and his work has profound connections to the study of polytopes and convex bodies, to error-correcting codes, expanders, and numerical analysis. Many of Spielman’s achievements came from a beautiful collaboration with Shang-Hua Teng spanning over two decades. This paper describes some ...

Spectral Sparsification of Graphs

We prove that every graph can be approximated by a sparse (re-weighted) subgraph, called a spectral sparsifier. Our notion of approximation requires that the Laplacian quadratic form of the sparsifier approximate that of the original. This is equivalent to saying that the Laplacian of the sparsifier is a good preconditioner for the Laplacian of the original. We present an algorithm that produce...

Graph Sparsification by Effective Resistances

We present a nearly-linear time algorithm that produces high-quality spectral sparsifiers of weighted graphs. Given as input a weighted graph $G = (V, E, w)$ and a parameter $\epsilon > 0$, we produce a weighted subgraph $H = (V, \tilde{E}, \tilde{w})$ of $G$ such that $|\tilde{E}| = O(n \log n / \epsilon^2)$ and for all vectors $x \in \mathbb{R}^V$

$$(1 - \epsilon) \sum_{uv \in E} (x(u) - x(v))^2 w_{uv} \le \sum_{uv \in \tilde{E}} (x(u) - x(v))^2 \tilde{w}_{uv} \le (1 + \epsilon) \sum_{uv \in E} (x(u) - x(v))^2 w_{uv}. \qquad (1)$$

This improves...

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of the MTS analysis process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

Publication date: 2012